{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## State space models - concentrating the scale out of the likelihood function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import statsmodels.api as sm\n",
    "\n",
    "dta = sm.datasets.macrodata.load_pandas().data\n",
    "dta.index = pd.PeriodIndex(start='1959Q1', end='2009Q3', freq='Q')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Introduction\n",
    "\n",
    "(much of this is based on Harvey (1989); see especially section 3.4)\n",
    "\n",
    "State space models can generically be written as follows (here we focus on time-invariant state space models, but similar results apply also to time-varying models):\n",
    "\n",
    "$$\n",
    "\\begin{align}\n",
    "y_t & = Z \\alpha_t + \\varepsilon_t, \\quad \\varepsilon_t \\sim N(0, H) \\\\\n",
    "\\alpha_{t+1} & = T \\alpha_t + R \\eta_t \\quad \\eta_t \\sim N(0, Q)\n",
    "\\end{align}\n",
    "$$\n",
    "\n",
    "Often, some or all of the values in the matrices $Z, H, T, R, Q$ are unknown and must be estimated; in statsmodels, estimation is often done by finding the parameters that maximize the likelihood function. In particular, if we collect the parameters in a vector $\\psi$, then each of these matrices can be thought of as functions of those parameters, for example $Z = Z(\\psi)$, etc.\n",
    "\n",
    "Usually, the likelihood function is maximized numerically, for example by applying quasi-Newton \"hill-climbing\" algorithms, and this becomes more and more difficult the more parameters there are. It turns out that in many cases we can reparameterize the model as $[\\psi_*', \\sigma_*^2]'$, where $\\sigma_*^2$ is the \"scale\" of the model (usually, it replaces one of the error variance terms) and it is possible to find the maximum likelihood estimate of $\\sigma_*^2$ analytically, by differentiating the likelihood function. This implies that numerical methods are only required to estimate the parameters $\\psi_*$, which has dimension one less than that of $\\psi$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example: local level model\n",
    "\n",
    "(see, for example, section 4.2 of Harvey (1989))\n",
    "\n",
    "As a specific example, consider the local level model, which can be written as:\n",
    "\n",
    "$$\n",
    "\\begin{align}\n",
    "y_t & = \\alpha_t + \\varepsilon_t, \\quad \\varepsilon_t \\sim N(0, \\sigma_\\varepsilon^2) \\\\\n",
    "\\alpha_{t+1} & = \\alpha_t + \\eta_t \\quad \\eta_t \\sim N(0, \\sigma_\\eta^2)\n",
    "\\end{align}\n",
    "$$\n",
    "\n",
    "In this model, $Z, T,$ and $R$ are all fixed to be equal to $1$, and there are two unknown parameters, so that $\\psi = [\\sigma_\\varepsilon^2, \\sigma_\\eta^2]$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Typical approach\n",
    "\n",
    "First, we show how to define this model without concentrating out the scale, using statsmodels' state space library:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "class LocalLevel(sm.tsa.statespace.MLEModel):\n",
    "    _start_params = [1., 1.]\n",
    "    _param_names = ['var.level', 'var.irregular']\n",
    "\n",
    "    def __init__(self, endog):\n",
    "        super(LocalLevel, self).__init__(endog, k_states=1, initialization='diffuse')\n",
    "\n",
    "        self['design', 0, 0] = 1\n",
    "        self['transition', 0, 0] = 1\n",
    "        self['selection', 0, 0] = 1\n",
    "\n",
    "    def transform_params(self, unconstrained):\n",
    "        return unconstrained**2\n",
    "\n",
    "    def untransform_params(self, unconstrained):\n",
    "        return unconstrained**0.5\n",
    "\n",
    "    def update(self, params, **kwargs):\n",
    "        params = super(LocalLevel, self).update(params, **kwargs)\n",
    "\n",
    "        self['state_cov', 0, 0] = params[0]\n",
    "        self['obs_cov', 0, 0] = params[1]\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are two parameters in this model that must be chosen: `var.level` $(\\sigma_\\eta^2)$ and `var.irregular` $(\\sigma_\\varepsilon^2)$. We can use the built-in `fit` method to choose them by numerically maximizing the likelihood function.\n",
    "\n",
    "In our example, we are applying the local level model to consumer price index inflation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                           Statespace Model Results                           \n",
      "==============================================================================\n",
      "Dep. Variable:                   infl   No. Observations:                  203\n",
      "Model:                     LocalLevel   Log Likelihood                -457.632\n",
      "Date:                Tue, 24 Dec 2019   AIC                            921.263\n",
      "Time:                        15:04:10   BIC                            931.203\n",
      "Sample:                    03-31-1959   HQIC                           925.285\n",
      "                         - 09-30-2009                                         \n",
      "Covariance Type:                  opg                                         \n",
      "=================================================================================\n",
      "                    coef    std err          z      P>|z|      [0.025      0.975]\n",
      "---------------------------------------------------------------------------------\n",
      "var.level         0.7447      0.156      4.766      0.000       0.438       1.051\n",
      "var.irregular     3.3733      0.315     10.715      0.000       2.756       3.990\n",
      "===================================================================================\n",
      "Ljung-Box (Q):                       41.51   Jarque-Bera (JB):               182.26\n",
      "Prob(Q):                              0.40   Prob(JB):                         0.00\n",
      "Heteroskedasticity (H):               1.75   Skew:                            -1.02\n",
      "Prob(H) (two-sided):                  0.02   Kurtosis:                         7.18\n",
      "===================================================================================\n",
      "\n",
      "Warnings:\n",
      "[1] Covariance matrix calculated using the outer product of gradients (complex-step).\n"
     ]
    }
   ],
   "source": [
    "mod = LocalLevel(dta.infl)\n",
    "res = mod.fit(disp=False)\n",
    "print(res.summary())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can look at the results from the numerical optimizer in the results attribute `mle_retvals`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'fopt': 2.2543435113821833, 'gopt': array([-7.10298487e-06, -9.72852909e-06]), 'fcalls': 27, 'warnflag': 0, 'converged': True, 'iterations': 7}\n"
     ]
    }
   ],
   "source": [
    "print(res.mle_retvals)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Concentrating out the scale"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, there are two ways to reparameterize this model as above:\n",
    "\n",
    "1. The first way is to set $\\sigma_*^2 \\equiv \\sigma_\\varepsilon^2$ so that $\\psi_* = \\psi / \\sigma_\\varepsilon^2 = [1, q_\\eta]$ where $q_\\eta = \\sigma_\\eta^2 / \\sigma_\\varepsilon^2$.\n",
    "2. The second way is to set $\\sigma_*^2 \\equiv \\sigma_\\eta^2$ so that $\\psi_* = \\psi / \\sigma_\\eta^2 = [h, 1]$ where $h = \\sigma_\\varepsilon^2 / \\sigma_\\eta^2$.\n",
    "\n",
    "In the first case, we only need to numerically maximize the likelihood with respect to $q_\\eta$, and in the second case we only need to numerically maximize the likelihood with respect to $h$.\n",
    "\n",
    "Either approach would work well in most cases, and in the example below we will use the second method."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To reformulate the model to take advantage of the concentrated likelihood function, we need to write the model in terms of the parameter vector $\\psi_* = [g, 1]$. Because this parameter vector defines $\\sigma_\\eta^2 \\equiv 1$, we now include a new line `self['state_cov', 0, 0] = 1` and the only unknown parameter is $h$. Because our parameter $h$ is no longer a variance, we renamed it here to be `ratio.irregular`.\n",
    "\n",
    "The key piece that is required to formulate the model so that the scale can be computed from the Kalman filter recursions (rather than selected numerically) is setting the flag `self.ssm.filter_concentrated = True`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "class LocalLevelConcentrated(sm.tsa.statespace.MLEModel):\n",
    "    _start_params = [1.]\n",
    "    _param_names = ['ratio.irregular']\n",
    "\n",
    "    def __init__(self, endog):\n",
    "        super(LocalLevelConcentrated, self).__init__(endog, k_states=1, initialization='diffuse')\n",
    "\n",
    "        self['design', 0, 0] = 1\n",
    "        self['transition', 0, 0] = 1\n",
    "        self['selection', 0, 0] = 1\n",
    "        self['state_cov', 0, 0] = 1\n",
    "        \n",
    "        self.ssm.filter_concentrated = True\n",
    "\n",
    "    def transform_params(self, unconstrained):\n",
    "        return unconstrained**2\n",
    "\n",
    "    def untransform_params(self, unconstrained):\n",
    "        return unconstrained**0.5\n",
    "\n",
    "    def update(self, params, **kwargs):\n",
    "        params = super(LocalLevelConcentrated, self).update(params, **kwargs)\n",
    "        self['obs_cov', 0, 0] = params[0]\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Again, we can use the built-in `fit` method to find the maximum likelihood estimate of $h$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                             Statespace Model Results                             \n",
      "==================================================================================\n",
      "Dep. Variable:                       infl   No. Observations:                  203\n",
      "Model:             LocalLevelConcentrated   Log Likelihood                -457.632\n",
      "Date:                    Tue, 24 Dec 2019   AIC                            921.263\n",
      "Time:                            15:04:10   BIC                            931.203\n",
      "Sample:                        03-31-1959   HQIC                           925.285\n",
      "                             - 09-30-2009   Scale                            0.745\n",
      "Covariance Type:                      opg                                         \n",
      "===================================================================================\n",
      "                      coef    std err          z      P>|z|      [0.025      0.975]\n",
      "-----------------------------------------------------------------------------------\n",
      "ratio.irregular     4.5297      1.226      3.694      0.000       2.126       6.933\n",
      "===================================================================================\n",
      "Ljung-Box (Q):                       41.51   Jarque-Bera (JB):               182.26\n",
      "Prob(Q):                              0.40   Prob(JB):                         0.00\n",
      "Heteroskedasticity (H):               1.75   Skew:                            -1.02\n",
      "Prob(H) (two-sided):                  0.02   Kurtosis:                         7.18\n",
      "===================================================================================\n",
      "\n",
      "Warnings:\n",
      "[1] Covariance matrix calculated using the outer product of gradients (complex-step).\n"
     ]
    }
   ],
   "source": [
    "mod_conc = LocalLevelConcentrated(dta.infl)\n",
    "res_conc = mod_conc.fit(disp=False)\n",
    "print(res_conc.summary())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The estimate of $h$ is provided in the middle table of parameters (`ratio.irregular`), while the estimate of the scale is provided in the upper table. Below, we will show that these estimates are consistent with those from the previous approach."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And we can again look at the results from the numerical optimizer in the results attribute `mle_retvals`. It turns out that two fewer iterations were required in this case, since there was one fewer parameter to select. Moreover, since the numerical maximization problem was easier, the optimizer was able to find a value that made the gradient for this parameter slightly closer to zero than it was above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'fopt': 2.2543435111703576, 'gopt': array([-6.71906975e-08]), 'fcalls': 12, 'warnflag': 0, 'converged': True, 'iterations': 5}\n"
     ]
    }
   ],
   "source": [
    "print(res_conc.mle_retvals)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Comparing estimates\n",
    "\n",
    "Recall that $h = \\sigma_\\varepsilon^2 / \\sigma_\\eta^2$ and the scale is $\\sigma_*^2 = \\sigma_\\eta^2$. Using these definitions, we can see that both models produce nearly identical results:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Original model\n",
      "var.level     = 0.74469\n",
      "var.irregular = 3.37330\n",
      "\n",
      "Concentrated model\n",
      "scale         = 0.74472\n",
      "h * scale     = 3.37338\n"
     ]
    }
   ],
   "source": [
    "print('Original model')\n",
    "print('var.level     = %.5f' % res.params[0])\n",
    "print('var.irregular = %.5f' % res.params[1])\n",
    "\n",
    "print('\\nConcentrated model')\n",
    "print('scale         = %.5f' % res_conc.scale)\n",
    "print('h * scale     = %.5f' % (res_conc.params[0] * res_conc.scale))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example: SARIMAX\n",
    "\n",
    "By default in SARIMAX models, the variance term is chosen by numerically maximizing the likelihood function, but an option has been added to allow concentrating the scale out."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Typical approach\n",
    "mod_ar = sm.tsa.SARIMAX(dta.cpi, order=(1, 0, 0), trend='ct')\n",
    "res_ar = mod_ar.fit(disp=False)\n",
    "\n",
    "# Estimating the model with the scale concentrated out\n",
    "mod_ar_conc = sm.tsa.SARIMAX(dta.cpi, order=(1, 0, 0), trend='ct', concentrate_scale=True)\n",
    "res_ar_conc = mod_ar_conc.fit(disp=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These two approaches produce about the same loglikelihood and parameters, although the model with the concentrated scale was able to improve the fit very slightly:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loglikelihood\n",
      "- Original model:     -245.8275\n",
      "- Concentrated model: -245.8264\n",
      "\n",
      "Parameters\n",
      "- Original model:     0.4921, 0.0243, 0.9808, 0.6490\n",
      "- Concentrated model: 0.4864, 0.0242, 0.9809, 0.6492\n"
     ]
    }
   ],
   "source": [
    "print('Loglikelihood')\n",
    "print('- Original model:     %.4f' % res_ar.llf)\n",
    "print('- Concentrated model: %.4f' % res_ar_conc.llf)\n",
    "\n",
    "print('\\nParameters')\n",
    "print('- Original model:     %.4f, %.4f, %.4f, %.4f' % tuple(res_ar.params))\n",
    "print('- Concentrated model: %.4f, %.4f, %.4f, %.4f' % (tuple(res_ar_conc.params) + (res_ar_conc.scale,)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This time, about 1/3 fewer iterations of the optimizer are required under the concentrated approach:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimizer iterations\n",
      "- Original model:     36\n",
      "- Concentrated model: 22\n"
     ]
    }
   ],
   "source": [
    "print('Optimizer iterations')\n",
    "print('- Original model:     %d' % res_ar.mle_retvals['iterations'])\n",
    "print('- Concentrated model: %d' % res_ar_conc.mle_retvals['iterations'])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
